Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Universal parser] Introduce rb_parser_ary_t #10004

Merged
merged 1 commit into from Mar 12, 2024

Conversation

hasumikin
Copy link
Contributor

@hasumikin hasumikin commented Feb 17, 2024

This PR is part of the work to make Lrama-generating-parse.c a universal parser

  • Introduce rb_parser_ary_t struct to partly eliminate RArray from parse.y
  • parser_params->tokens and parser_params->ast->node_buffer->tokens becomes rb_parser_ary_t *
  • Instead, ast_node_all_tokens() internally creates a Ruby Array object from the rb_parser_ary_t *tokens

@hasumikin hasumikin changed the title []Introduce rb_parser_ary_t [Universal parser] Introduce rb_parser_ary_t Feb 17, 2024
node.c Outdated
@@ -483,7 +511,7 @@ rb_ast_delete_mark_object(rb_ast_t *ast, VALUE obj)
VALUE
rb_ast_tokens(rb_ast_t *ast)
{
return ast->node_buffer->tokens;
return (VALUE)ast->node_buffer->tokens;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is rb_parser_ary_t compatible with VALUE?
This function is used in the implementation of RubyVM::AbstractSyntaxTree::Node#all_tokens, and the return value will be seen in Ruby level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reimplemented rb_ast_tokens() that constructs RArray from rb_parser_ary_t 8a986a2

parse.y Outdated
p->token_id++;

if (p->debug) {
rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, ary);
rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, (VALUE)token);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PRsVALUE calls to_s method on the corresponding argument.
rb_parser_ast_token_t does not seem to compatible with VALUE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review!
I implemented rb_parser_str_escape() which does not depend on VALUE
15e42bc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nobu Could you please review them again?

@hasumikin hasumikin force-pushed the rb_parser_ary branch 2 times, most recently from 4cad33b to 8a986a2 Compare February 24, 2024 07:05
parse.y Outdated
ary->data = xrealloc(ary->data, sizeof(void *) * len);
}
for (i = ary->len; i < len; i++) {
ary->data[i] = Qnil;
Copy link
Contributor

@yui-knk yui-knk Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not Qnil but 0 (or NULL)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse.y Outdated
ary->data = NULL;
}
for (i = 0; i < len; i++) {
ary->data[i] = Qnil;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not Qnil but 0 (or NULL)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse.y Outdated
rb_parser_ary_push(rb_parser_t *p, rb_parser_ary_t *ary, VALUE val)
{
if (ary->len == ary->capa) {
rb_parser_ary_extend(p, ary, ary->len + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanding the array with ary->len + 1 is no efficient especially for ast token array because it's expected to be expanded again and again. How about to double the capacity?

Copy link
Contributor Author

@hasumikin hasumikin Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: hasumikin@d1fd5ba
BTW, what does the comment // TODO in rb_ruby_parser_keep_tokens() mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might to be "Remove dependency for Ruby Array", then please remove the // TODO comment in this PR, thank you.

Copy link
Contributor Author

@hasumikin hasumikin Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the line hasumikin@968c063

node.c Outdated
parser_ast_token_free(rb_ast_t *ast, rb_parser_ast_token_t *token)
{
if (!token) return;
parser_string_free(ast, (rb_parser_string_t *)token->str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this cast for rb_parser_string_t * needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node.c Outdated
@@ -483,7 +523,32 @@ rb_ast_delete_mark_object(rb_ast_t *ast, VALUE obj)
VALUE
rb_ast_tokens(rb_ast_t *ast)
{
return ast->node_buffer->tokens;
long i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to move this ruby object construction logic to ast.c? By the change, node.c no need to handle VALUE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought that too but,
If we move the logic from node.c to ast.c, do you also think we should redesign ast->node_buffer->tokens into ast->tokens because we can say the tokens member no longer belongs to a Node?
(Of course, we cannot change the Ruby level design: AbstractSyntaxTree::Node though)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a long term, it's correct however It's impossible right now. rb_ast_t is imemo (internal ruby object) now therefore it's size should be 5 words. It's already uses 5 words like below. We can refactor it once rb_ast_t is not managed as imemo.

typedef struct rb_ast_body_struct {
    const NODE *root;
    VALUE script_lines;
    // script_lines is either:
    // - a Fixnum that represents the line count of the original source, or
    // - an Array that contains the lines of the original source
    signed int frozen_string_literal:2; /* -1: not specified, 0: false, 1: true */
    signed int coverage_enabled:2; /* -1: not specified, 0: false, 1: true */
} rb_ast_body_t;
typedef struct rb_ast_struct {
    VALUE flags;
    node_buffer_t *node_buffer;
    rb_ast_body_t body;
} rb_ast_t;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see

Copy link
Contributor Author

@hasumikin hasumikin Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rb_obj_freeze(tokens);
rb_ast_set_tokens(p->ast, tokens);
p->ast->node_buffer->tokens = tokens;
p->tokens = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add free for rb_ruby_parser_free just in case, if an exception happens in the middle of parsing, this line can be skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this OK? hasumikin@c2d3fc8 (I amend a function name parser_ary_free() to parser_token_free() for clarity)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parser_token_free is preferred, BTW the existence of ast and ast->node_buffer is confirmed by rb_ast_free. Therefore it's more simple to call parser_tokens_free directly in rb_node_buffer_free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Fixed hasumikin@e9e3ac6

parse.y Outdated
p->token_id++;

if (p->debug) {
rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, ary);
rb_parser_printf(p, "Append tokens (line: %d) [%d, :%s, \"%s\", [%d, %d, %d, %d]]\n",
line, token->id, token->type_name, rb_parser_str_escape(p, token->str)->ptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess rb_parser_string_t allocated in rb_parser_str_escape is not freed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in hasumikin@edd7c6d

@hasumikin hasumikin force-pushed the rb_parser_ary branch 2 times, most recently from 10e95cd to e9e3ac6 Compare February 29, 2024 08:41
@hasumikin hasumikin marked this pull request as ready for review February 29, 2024 09:43
parse.y Outdated Show resolved Hide resolved
parse.y Outdated
Comment on lines 2624 to 2626
for (i = 0; i < len; i++) {
ary->data[i] = 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xcalloc zero-fills as well as the standard calloc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Fixed

rubyparser.h Outdated
* Array-like object for parser
*/
typedef struct rb_parser_ary {
uintptr_t *data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this data is a pointer to uintptr_t instead of rb_parser_ast_token_t?
Also val of rb_parser_ary_push.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We plan to have rb_parser_ary store elements of arbitrary types. However, after talking to Kaneko-san, we decided to limit it to the rb_parser_ast_token_t* type for now.
85ff09c
Thank you for the review!

@hasumikin hasumikin force-pushed the rb_parser_ary branch 2 times, most recently from b5fffa1 to f12fb8d Compare March 6, 2024 01:51
- Introduce `rb_parser_ary_t` structure to partly eliminate RArray from parse.y
  - In this patch, `parser_params->tokens` and `parser_params->ast->node_buffer->tokens` are now `rb_parser_ary_t *`
  - Instead, `ast_node_all_tokens()` internally creates a Ruby Array object from the `rb_parser_ary_t`
  - Also, delete `rb_ast_tokens()` and `rb_ast_set_tokens()` in node.c

- Implement `rb_parser_str_escape()`
  - This is a port of the `rb_str_escape()` function in string.c
  - `rb_parser_str_escape()` does not depend on `VALUE` (RString)
  - Instead, it uses `rb_parser_stirng_t *`
  - This function works when --dump=y option passed

- Because WIP of the universal parser, similar functions like `rb_parser_tokens_free()` exist in both node.c and parse.y. Refactoring them may be needed in some way in the future

- Although we considered redesigning the structure: `ast->node_buffer->tokens` into `ast->tokens`, we leave it as it is because `rb_ast_t` is an imemo. (We will address it in the future)
@yui-knk yui-knk merged commit 9a19cfd into ruby:master Mar 12, 2024
97 checks passed
@yui-knk
Copy link
Contributor

yui-knk commented Mar 12, 2024

Thank you!

@hasumikin hasumikin deleted the rb_parser_ary branch March 12, 2024 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants